Hunting for Entailing Pairs in the Penn Discourse Treebank

Authors

  • Sara Tonelli
  • Elena Cabrio
Abstract

Given the growing amount of resources developed in the NLP community, it is crucial to exploit annotated data and tools as much as possible across different research domains. Past work on discourse analysis has been conducted in parallel with research on semantic inference and, although the two fields of study are intertwined, there have been only a few initiatives to relate them. Our work addresses the issue of interoperability by investigating the connection between implicit Restatement relations in the Penn Discourse Treebank (PDTB) and Textual Entailment (TE). We compare the performance of two TE systems on the Restatement pairs and argue, through a manual validation of the pairs, that TE is a subclass of Restatement. Furthermore, we observe that entailing pairs extracted from the PDTB add interesting additional levels of complexity to TE, since the inference relation relies less on lexical-syntactic variations and more on reasoning.

TITLE AND ABSTRACT IN ITALIAN

A caccia di inferenze semantiche nel Penn Discourse Tree Bank

Data l’ingente quantità di risorse sviluppate in trattamento automatico del linguaggio, l’importanza di sfruttare anche in altri campi di ricerca i dati annotati e gli strumenti implementati è diventata fondamentale. In passato, lavori sull’analisi del discorso sono stati condotti parallelamente alla ricerca sulle inferenze semantiche, ma sebbene i due campi di studio presentino numerosi punti in comune, non ci sono state iniziative per avvicinarli. Questo lavoro affronta la questione dell’interoperabilità investigando le connessioni tra la relazione implicita di Restatement nel Penn Discourse Treebank (PDTB) e Textual Entailment (TE) (implicazione semantica). Comparando i risultati ottenuti da due sistemi che riconoscono automaticamente la relazione di implicazione, e dall’annotazione manuale di un sottoinsieme di coppie, mostriamo come il TE sia riconducibile ad una sottocategoria di Restatement. Inoltre, osserviamo che le coppie in relazione di implicazione estratte dal PDTB mostrano un livello di complessità superiore rispetto a quelle considerate dai sistemi attuali, in quanto la relazione di inferenza si basa meno su variazioni lessico-sintattiche, e più sul ragionamento.

Similar resources

Searching in the Penn Discourse Treebank Using the PML-Tree Query

The PML-Tree Query is a general, powerful and user-friendly system for querying richly linguistically annotated treebanks. The present paper shows how the PML-Tree Query can be used for searching for discourse relations in the Penn Discourse Treebank 2.0 mapped onto the syntactic annotation of the Penn Treebank.

Annotation And Data Mining Of The Penn Discourse TreeBank

The Penn Discourse TreeBank (PDTB) is a new resource built on top of the Penn Wall Street Journal corpus, in which discourse connectives are annotated along with their arguments. Its use of standoff annotation allows integration with a stand-off version of the Penn TreeBank (syntactic structure) and PropBank (verbs and their arguments), which adds value for both linguistic discovery and discour...

Automatic Discourse Segmentation using Neural Networks

In example (1), a sentence from a Wall Street Journal article taken from the Penn TreeBank corpus is further segmented into four EDUs, (1a), (1b), (1c) and (1d) (RST, 2002). Discourse segmentation, clearly, is not as easy as sentence boundary detection. The lack of consensus with regard to what constitutes an elementary discourse unit adds to the difficulty. Building a rule-based discourse seg...

A Short Introduction to the Penn Discourse TreeBank

Taking discourse connectives to be the predicates of binary discourse relations, the goal of the Penn Discourse Treebank (PDTB) is to annotate the million-word WSJ corpus in the Penn TreeBank with each of its discourse connectives and their arguments. The paper describes the linguistic observations and ideas that led to the PDTB, the decisions that shaped its content and the tools used in its devel...

Towards interoperable discourse annotation. Discourse features in the Ontologies of Linguistic Annotation

This paper describes the extension of the Ontologies of Linguistic Annotation (OLiA) with respect to discourse features. The OLiA ontologies provide a terminology repository that can be employed to facilitate the conceptual (semantic) interoperability of annotations of discourse phenomena as found in the most important corpora available to the community, including OntoNotes, the RST Discourse...

Publication date: 2012